We will need the package ggplot2:
- Check that ggplot2 is installed
- If not, install it, then load it
library(ggplot2)
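The check-then-install steps listed above can be sketched as follows. This is a minimal sketch, assuming a CRAN mirror is configured and reachable; it relies only on the base R functions requireNamespace() and install.packages().

```r
# Install ggplot2 only if it is missing, then load it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")  # assumption: a CRAN mirror is available
}
library(ggplot2)
```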
We also need the “fruits” data:
data("fruits", package = "debuter")
Nov. 25
From Data to Viz : https://www.data-to-viz.com/
The base function for bar plots is barplot:
barplot(table(fruits$groupe))
With colors:
barplot(table(fruits$groupe), col = 1:4)
ggplot(data = fruits, aes(x = groupe, fill = groupe)) + geom_bar()
STOP !
- ggplot: create an empty canvas
- aes: declare aesthetic parameters (position, color, width, shape, opacity, etc.)
- geom_bar: use a geometry
| Data | `data` | The data used to create the graph. Each line represents an object to add to the graph. |
| Geometry | `geom_` | How to represent the objects: points, lines, surfaces, etc. |
| Aesthetics | `aes()` | Aesthetic parameters of the shapes: position, color, shape, size, etc. |
| Scale | `scale_` | Functions used to set how the shapes are created from the objects and the aesthetic parameters. For example, the function scale_color_manual lets users pick their own colors. |
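To make the four building blocks concrete, here is a small sketch that combines data, aesthetics, a geometry and a scale in one plot. The data frame `toy` and its group labels are hypothetical stand-ins for the `fruits` data, so the block runs without the debuter package.

```r
library(ggplot2)

# Hypothetical toy data: one row per object to draw.
toy <- data.frame(groupe = c("a", "a", "b", "c", "b", "a"))

p <- ggplot(data = toy, aes(x = groupe, fill = groupe)) +  # data + aesthetics
  geom_bar() +                                              # geometry
  scale_fill_manual(values = c(a = "tomato",                # scale: custom colors
                               b = "goldenrod",
                               c = "steelblue"))
p
```

Swapping scale_fill_manual for another scale_fill_*** function changes the colors without touching the data, the aesthetics or the geometry.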
Reproduce the graph on the right:
ggplot(***,
aes(***,
fill = Sucres > 10)) +
geom_***()
ggplot “1” (see here)
Hadley Wickham
Together, we will look at a few specific geometries that are used to create classic graphs.
| geom_bar | Bar plot on non-aggregated data |
| geom_col | Bar plot on existing counts |
| geom_histogram | Histogram of a quantitative variable |
| geom_boxplot | Tukey diagram aka boxplot |
| geom_violin | “Violin” plot |
| geom_point | Scatter plot |
| geom_line | Line plot |
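The difference between the first two rows of the table can be illustrated with a short sketch. The data frame `raw` is a hypothetical example, not part of the course data; layer_data() is the ggplot2 helper that returns the data computed for a layer.

```r
library(ggplot2)

# Hypothetical raw data: one row per fruit.
raw <- data.frame(fruit = c("Ananas", "Durian", "Durian", "Ananas", "Durian"))

# geom_bar counts the rows itself.
p_bar <- ggplot(raw, aes(x = fruit)) + geom_bar()

# geom_col expects the counts to be computed beforehand.
counts <- as.data.frame(table(fruit = raw$fruit))  # columns: fruit, Freq
p_col <- ggplot(counts, aes(x = fruit, y = Freq)) + geom_col()

# Both plots draw bars of the same heights.
layer_data(p_bar)$count
layer_data(p_col)$y
```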
We already know how to do it:
ggplot(fruits, aes(cut(Eau, c(0, 84.2, 100)))) + geom_bar(fill = "steelblue")
When you already have counts.
dat.count <- data.frame(
Fruit = c("Ananas", "Durian"),
Nb = c(10, 20)
)
ggplot(data = dat.count, aes(x = Fruit, y = Nb)) +
geom_col()
Add colors to the previous bar plot!
ggplot(fruits, aes(Sucres)) + geom_bar()
ggplot(fruits, aes(Sucres)) + geom_histogram()
To plot counts for :
To plot counts or densities for:
In this case, it is very important to choose the intervals!
ggplot(fruits, aes(Sucres)) + geom_histogram()
To create a histogram, one needs to distribute values into classes.
- hist does it automatically with an algorithm (Sturges by default, but the user can choose the Scott or Freedman-Diaconis algorithms). If n is specified, the function chooses a value close to n that gives pretty intervals. To force the classes, use breaks.
- geom_histogram creates 30 classes by default; it is the user’s job to specify their classes or the number of classes they want.
ggplot(fruits, aes(Sucres)) + geom_histogram(breaks = seq(0, 75, 5))
ggplot(fruits, aes(Sucres)) +
geom_histogram(breaks = seq(0, 75, 5),
fill = "steelblue")
ggplot(fruits, aes(Sucres)) +
geom_histogram(breaks = seq(0, 75, 5),
fill = "steelblue",
color = "white")
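The two ways of choosing classes described above can be compared side by side. This is a sketch on simulated values: the data frame `toy` is hypothetical and only mimics the range of the Sucres variable.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(Sucres = runif(200, min = 0, max = 75))  # simulated values

# Base R: hist() picks pretty classes; the algorithm can be chosen by name.
h_sturges <- hist(toy$Sucres, plot = FALSE)                 # Sturges (default)
h_fd      <- hist(toy$Sucres, breaks = "FD", plot = FALSE)  # Freedman-Diaconis
h_sturges$breaks
h_fd$breaks

# ggplot2: the classes are given explicitly, here 15 classes of width 5.
p_breaks <- ggplot(toy, aes(Sucres)) +
  geom_histogram(breaks = seq(0, 75, 5))
nrow(layer_data(p_breaks))  # one row per class
```

geom_histogram also accepts bins (a number of classes) or binwidth (a class width) when exact break points are not needed.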
ggplot(data=fruits, aes(x = Sucres)) + geom_boxplot()
ggplot(data=fruits, aes(x=groupe, y=Sucres)) + geom_boxplot()
ggplot(data=fruits,
aes(x = Sucres, y = 1)) +
geom_violin()
ggplot(data=fruits,
aes(x = groupe, y = Sucres)) +
geom_violin()
Complete the code to obtain the graph on the right:
ggplot(fruits,
aes(x = Fibres > 1.5,
y = Proteines,
fill = ***)) +
geom_***()
Themes are pre-defined functions that change the appearance of ggplots:
Examples (theme_***()) :
- theme_bw() for a black and white theme,
- theme_minimal() for a minimalist theme,
- theme_void() for an empty theme.
theme_bw()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_bw()
theme_minimal()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_minimal()
theme_void()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_void()
Open the help for theme_bw with the command ?theme_bw
ggplot(fruits, aes(y = Fibres)) + geom_boxplot() + theme_***()
- ggtitle
- xlab
- ylab
Use the wrapper function labs to go even faster:
labs(
  title = "Graph title",
  subtitle = "Graph subtitle",
  x = "x-axis title",
  y = "y-axis title",
  color = "Color legend title",
  shape = "Shape legend title"
)
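As an illustration, labs() is added to a plot like any other layer. The sketch below uses a hypothetical three-point data frame `toy` and shows the same titles set once with labs() and once with the individual functions.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3), grp = c("a", "b", "a"))

# All the titles in a single call to labs().
p <- ggplot(toy, aes(x = x, y = y, color = grp)) +
  geom_point() +
  labs(title = "Graph title", x = "x-axis title", y = "y-axis title",
       color = "Color legend title")

# The same axis and plot titles, one function at a time.
p2 <- ggplot(toy, aes(x = x, y = y, color = grp)) +
  geom_point() +
  ggtitle("Graph title") + xlab("x-axis title") + ylab("y-axis title")
```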
With the function theme(): each element has to be defined according to its nature.
- element_text(size = , colour = "", family = "") (e.g. titles)
- element_line(colour = "", size = ) (e.g. major and minor grids)
- element_rect(fill = "") (e.g. background)
theme()
- axis.title, axis.title.x, axis.title.y: size, font, color, …
- axis.text, axis.text.x, axis.text.y: size, font, color, …
- axis.ticks, axis.ticks.x, axis.ticks.y
- axis.line, axis.line.x, axis.line.y
- panel.background: color
- panel.grid.major, panel.grid.minor: color, size
- legend.text: size, font, color
- legend.position
- plot.title: size, font, color
geom_point
This geometry needs \(x\) and \(y\) aesthetic parameters, and optionally accepts size, color and shape.
ggplot(fruits, aes(x = Phosphore, y = Calcium, size = Magnesium)) + geom_point()
When aesthetic parameters are specified in aes(), they map values (from the dataset) to a characteristic of the objects that are drawn on the graph.
- color or colour: color (of the point)
- fill: color (inside a shape)
- size: size
- shape: shape
- alpha: opacity
- linetype: type of line
- label: labels
Specified outside of aes(), they apply one fixed value to all the objects!
ggplot(fruits,
aes(x = Phosphore, y = Calcium,
color = Magnesium)) +
geom_point() +
theme(legend.position = "bottom")
ggplot(fruits,
aes(x = Phosphore, y = Calcium)) +
geom_point(color = "limegreen")
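A common pitfall follows from this distinction: a color name placed inside aes() is treated as data, not as a color. The sketch below, on a hypothetical three-row data frame `toy`, compares the colors that ggplot2 actually computes in both cases.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(Phosphore = c(10, 20, 30), Calcium = c(5, 15, 10))

# Outside aes(): the color is set to a fixed value for every point.
p_set <- ggplot(toy, aes(x = Phosphore, y = Calcium)) +
  geom_point(color = "limegreen")

# Inside aes(): "limegreen" is treated as a one-level category, so the
# points get a color from the default palette and a legend appears.
p_map <- ggplot(toy, aes(x = Phosphore, y = Calcium, color = "limegreen")) +
  geom_point()

unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```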
Complete the code to obtain the graph on the right:
ggplot(fruits,
aes(x = Sucres,
y = Proteines,
*** = Magnesium,
*** = ***)) +
geom_***() +
***(title = "Fruits",
x = "Sucres (g/100 g)",
y = "Protéines, N x 6.25 (g/100 g)",
size = "Magnésium\n(mg/100 g)",
***= "Groupe") +
theme_***()
Don’t panic, use opacity (aka alpha):
ggplot(fruits,
aes(x = Phosphore,
y = Calcium,
color = groupe)) +
geom_point(alpha = 0.5,
size = 2) +
theme_bw() +
theme(legend.position =
"bottom")
scale_*** functions
They allow the user to customize a scale (in \(x\) or \(y\), but not only)!
- scale_x_log10() changes the \(x\) scale to a logarithmic scale,
- scale_y_log10() changes the \(y\) scale to a logarithmic scale,
- scale_color_manual() customizes the colors,
- scale_fill_manual() customizes the colors inside shapes,
- scale_x_continuous() customizes the \(x\) scale for a continuous variable,
- scale_y_continuous() customizes the \(y\) scale for a continuous variable,
- scale_x_discrete() customizes the \(x\) scale for a discrete variable,
- scale_y_discrete() customizes the \(y\) scale for a discrete variable.
Complete the code to obtain the graph on the right:
ggplot(fruits,
aes(Phosphore,
Calcium)) +
geom_point(*** = "white") +
scale_***() +
scale_***() +
labs(x = "log10(Phosphore)",
y = "log10(Calcium)") +
theme_dark()
coord_*** functions
They allow the user to change the coordinate system after applying all the scaling transformations (with scale_*** functions). For example:
- coord_fixed to fix the ratio between the units on the \(y\) axis and the units on the \(x\) axis,
- coord_equal when the ratio is set to 1,
- coord_flip to flip the axes,
- coord_polar to get a plot in the polar coordinate system.
*lim* functions
They allow the user to specify the limits (minimum and maximum) on a specified axis. Caution: the values outside are eliminated from the graph!
- xlim, ylim or lims to change the range,
- expand_limits to extend the range.
To “zoom in” without losing data, use coord_cartesian or scale_***
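The caution about eliminated values matters most when a geometry computes statistics. The following sketch, on a hypothetical data frame `toy` with one extreme value, compares the median drawn by a boxplot when the range is restricted with ylim and with coord_cartesian.

```r
library(ggplot2)

# Hypothetical toy data: nine small values and one extreme value.
toy <- data.frame(grp = "a", value = c(1:9, 100))

base <- ggplot(toy, aes(x = grp, y = value)) + geom_boxplot()

# ylim() drops the value 100 before the boxplot statistics are computed
# (ggplot2 emits a warning about the removed row).
p_lim <- base + ylim(0, 10)

# coord_cartesian() only zooms: all ten values are still used.
p_zoom <- base + coord_cartesian(ylim = c(0, 10))

layer_data(p_lim)$middle   # median of 1:9
layer_data(p_zoom)$middle  # median of c(1:9, 100)
```

The two plots show the same region of the \(y\) axis, but the boxes differ because they are not computed from the same values.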
facet_wrap
Used to divide the graph into panels.
Careful about the syntax: it is based on vars.
To divide a graph g into several panels according to the value of a factor fac:
g + facet_wrap(facets = vars(fac))
One can also use a “formula” :
g + facet_wrap(~ fac)
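Both syntaxes can be tried on a small example. In this sketch the data frame `toy` and its column `fac` are hypothetical; the number of panels is read from the PANEL column returned by layer_data().

```r
library(ggplot2)

# Hypothetical toy data with a three-level grouping variable.
toy <- data.frame(
  x   = c(1, 2, 3, 4, 5, 6),
  y   = c(2, 1, 4, 3, 6, 5),
  fac = c("a", "a", "b", "b", "c", "c")
)
g <- ggplot(toy, aes(x = x, y = y)) + geom_point()

p_vars    <- g + facet_wrap(facets = vars(fac))  # vars() syntax
p_formula <- g + facet_wrap(~ fac)               # formula syntax

# Both versions produce one panel per value of fac.
length(unique(layer_data(p_vars)$PANEL))
length(unique(layer_data(p_formula)$PANEL))
```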
ggplot(fruits,
aes(x = Phosphore,
y = Calcium,
color = groupe)) +
geom_point() +
facet_wrap(vars(Sucres > 10)) +
theme_bw() +
theme(legend.position =
"bottom")
facet_grid
It is used the same way as facet_wrap.
To divide a graph g into several panels according to the value of a factor factorow for the rows and factocol for the columns:
g + facet_grid(rows = vars(factorow), cols = vars(factocol))
One can also use a “formula” :
g + facet_grid(factorow ~ factocol)
A PIECE OF ADVICE: when using faceting, be careful about the levels of the categorical variables that you are going to use.
Usage example:
g <- ggplot(fruits, aes(groupe)) + geom_bar()
ggsave(filename = "mongraphe.png", plot = g)
The extension given in filename will be magically used to save the graph in the correct format!
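The role of the extension can be checked with a short sketch. It saves a plot as a PDF into a temporary directory; the data frame `toy`, the output path and the chosen dimensions are assumptions made for this illustration.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(groupe = c("a", "b", "b"))
g <- ggplot(toy, aes(groupe)) + geom_bar()

# The ".pdf" extension selects the PDF format.
# tempdir() avoids writing into the working directory.
out <- file.path(tempdir(), "mongraphe.pdf")
ggsave(filename = out, plot = g, width = 10, height = 8, units = "cm")
file.exists(out)
```

Giving width, height and units explicitly makes the saved file independent of the size of the current graphics window.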
ggplot2 is very complete: